Abstract

Background: Trichomonas vaginalis is the most prevalent non-viral sexually transmitted parasite. Although the protist is presumed to reproduce asexually, 60% of its haploid genome consists of transposable elements (TEs), which are known contributors to genome variability. The availability of a draft genome sequence and our collection of >200 global isolates of T. vaginalis facilitate the study of TE population dynamics and of the contribution of TEs to genomic variability in this protist.

Results: We present here a pilot study of a subset of class II Tc1/mariner TEs that belong to the T. vaginalis Tvmar1 family. We report the genetic structure of 19 Tvmar1 loci, their ability to encode a full-length transposase protein, and their insertion frequencies in 94 global isolates from seven regions of the world. While most of the Tvmar1 elements studied exhibited low insertion frequencies, two of the 19 loci (locus 1 and locus 9) showed high insertion frequencies of 1.00 and 0.96, respectively. The genetic structuring of the global populations identified by principal component analysis (PCA) of the Tvmar1 loci is in general agreement with published data based on genotyping, showing that Tvmar1 polymorphisms are a robust indicator of T. vaginalis genetic history. Analysis of the expression of 22 genes flanking 13 Tvmar1 loci indicated significantly altered expression of six of the genes next to five Tvmar1 insertions, suggesting that the insertions have functional implications for T. vaginalis gene expression.

Conclusions: Our study is the first in T. vaginalis to describe Tvmar1 population dynamics and the contribution of this family to the genetic variability of the parasite. We show that most of the Tvmar1 insertion loci we studied occur at very low frequencies in the global population, and that insertions vary between geographical isolates. In addition, we observe that low-frequency insertions are associated with reduced or abolished expression of flanking genes. While low insertion frequencies might be expected, we identified two Tvmar1 insertion loci that are fixed or nearly fixed across global populations. This observation indicates that Tvmar1 insertions may have differing impacts and fitness costs in the host genome and may play varying roles in the adaptive evolution of T. vaginalis.

Keywords: DNA transposable element, Mariner transposase, Trichomonas vaginalis, Population genetics, Gene expression



Background 

Transposable elements (TEs) are mobile genetic units that exhibit broad diversity in their structure and transposition mechanisms. They are present in many eukaryotic genomes, and their movement and accumulation represent a major force shaping the genes and genomes of almost all organisms. TEs are typically divided into two classes: class I, which transposes via an RNA intermediate, and class II, which transposes via a DNA intermediate. There are three major subclasses of class II DNA transposons: 1) those that excise as double-stranded DNA and reinsert elsewhere in the genome, that is, the classic 'cut-and-paste' transposons [1]; 2) those that use a mechanism probably related to rolling-circle replication, such as Helitrons [2,3]; and 3) Mavericks, whose mechanism of transposition is not yet well understood but which likely replicate using a self-encoded DNA polymerase [4]. Cut-and-paste transposons characteristically contain a central protein-coding region encoding the transposase required for transposition, flanked (with a few exceptions) by terminal inverted repeats (TIRs).

While the contribution of TEs to the genomes of many higher eukaryotes has been well described, we know little about TE presence and influence in the genomes of single-celled eukaryotes. Parasitic protists have compact genomes (approximately 10 to 80 Mb [5]), and their TE content, which correlates with haploid genome size, ranges from none (for example, the malaria parasite Plasmodium falciparum) to >14% (for example, Entamoeba histolytica) [6,7]. However, unlike other parasites of importance to human health, the sexually transmitted parasite Trichomonas vaginalis has a large genome of approximately 160 Mb, two thirds of which consists of TE repeats, predominantly class II DNA transposons [1,4,8,9]. Recent studies indicate that the large genome size of T. vaginalis can be largely accounted for by the massive amplification of Maverick TEs [4], which are present in approximately 3,000 copies in the genome. The average size of Maverick elements in T. vaginalis is 15 to 20 kb; thus they probably occupy up to approximately 60 Mb (37%) of the 160 Mb genome. Mavericks have been hypothesized to have a significant impact on genome dynamics [4].
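As a back-of-envelope check of the Maverick estimate above, using only the copy number and element sizes given in the text (no additional data):

```latex
\begin{aligned}
3{,}000 \times 15\ \text{kb} &= 45\ \text{Mb}, \qquad 3{,}000 \times 20\ \text{kb} = 60\ \text{Mb},\\[4pt]
\frac{60\ \text{Mb}}{160\ \text{Mb}} &= 0.375 \approx 37\%.
\end{aligned}
```

The 37% figure therefore corresponds to the upper end of the reported element size range; at 15 kb per copy the share would be about 28% (45/160).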

In addition to Mavericks, a family of DNA transposons belonging to the ubiquitous Tc1/mariner superfamily, Tvmar1, is present in the T. vaginalis genome in over 1,000 copies. Tvmar1 was the first mariner family member to be found in a protist, and is one of only a handful of active mariner elements found in any species [10]. The Tvmar1 family is highly specific to T. vaginalis, since very closely related homologs could not be detected by Southern blot in other trichomonad species such as Trichomonas tenax and Pentatrichomonas hominis [10], suggesting recent acquisition in the T. vaginalis lineage. Thus the Tvmar1 family may play an important role in T. vaginalis speciation and adaptation [10].

The large size of the T. vaginalis genome is thought to be due to the high copy number of TE families [1,8,9]. TE abundance is correlated with genome size, which in turn is correlated with cell size across different phyla [11-14]. Cell size is an important factor for T. vaginalis parasitism: the larger the cell, the more surface T. vaginalis has with which to adhere to host epithelial cells, a crucial factor in its pathogenicity. Tritrichomonas foetus, which parasitizes the urogenital tract of cattle, is another trichomonad with a similarly large genome (approximately 177 Mb), which may be caused by TE load [15]. Although this has yet to be tested, it is tempting to speculate that expansion of TE families could provide the raw variation upon which natural selection could act, favoring the largest cells [8]. To what extent TEs shape genetic diversity among T. vaginalis isolates, and whether the benefits of a large genome size are enough to counteract the potentially deleterious effects of TE insertions in or near host genes, are important questions. In this study, we aimed to move closer to answering them by investigating the abundance and distribution of a subset of 19 Tvmar1 loci in 94 global isolates of T. vaginalis. In addition, we sought to determine the effect of Tvmar1 insertions on host gene expression and the functional implications of such insertions.

Results 

Characterization of Tvmar1 elements in the T. vaginalis genome

Approximately 1,000 Tvmar1 elements are currently annotated in the G3 reference genome, although many appear fragmented due to an incomplete assembly caused by the repetitive nature of the T. vaginalis genome. To identify complete Tvmar1 elements (defined as those that contain no ambiguous base calls and are flanked by 3' and 5' TIRs [16]) for use in this study, we screened the reference genome in TrichDB [17]. A total of 408 intact elements were identified and their DNA sequences aligned for characterization (data not shown). The sequences were highly similar, with an average pairwise difference of 0.006 and a modal length of 1,304 bp, identical to that of the consensus sequence. We classified the 408 elements as putatively autonomous (those that retain the ability to encode a transposase protein identical in amino acid sequence to the consensus) or nonautonomous (derivatives of autonomous elements that have acquired disruptive mutations in the transposase open reading frame (ORF), but that can still be mobilized in the presence of a transposase encoded by an autonomous element). Of the 408 elements, 33 were classified as autonomous. The remaining 375 contained at least one nonsynonymous substitution within the transposase ORF (184 elements), or an indel or nonsense mutation that truncated the transposase ORF (191 elements). Note that some of these nonsynonymous substitutions may not impair transposase function, so the number of functionally autonomous elements may be higher.
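The autonomous/nonautonomous classification described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes each element's transposase ORF has already been extracted in frame and that the standard genetic code applies, and the function names (`translate`, `classify_element`, `mean_pairwise_difference`) are hypothetical. The last function shows one way to compute an average pairwise difference such as the 0.006 reported above, as a per-site p-distance that skips gapped columns.

```python
from itertools import combinations, product

# Standard genetic code: codons enumerated TTT, TTC, TTA, TTG, TCT, ... (base order TCAG).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str) -> str:
    """Translate an in-frame ORF; codons containing ambiguous bases become 'X'."""
    orf = orf.upper()
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))


def classify_element(orf: str, consensus_protein: str) -> str:
    """Return 'autonomous', 'nonsynonymous' or 'truncated' for one transposase ORF."""
    protein = translate(orf)
    if protein.endswith("*"):          # drop the normal terminal stop codon
        protein = protein[:-1]
    # Frameshifting indels, premature stops or length changes disrupt the ORF.
    if len(orf) % 3 != 0 or "*" in protein or len(protein) != len(consensus_protein):
        return "truncated"
    if protein == consensus_protein:
        return "autonomous"
    return "nonsynonymous"             # full length but differs from the consensus


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def mean_pairwise_difference(aligned: list[str]) -> float:
    """Average p-distance over all pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    consensus = "MKL"  # toy stand-in for the real transposase consensus
    for orf in ["ATGAAACTTTAA", "ATGAGACTTTAA", "ATGTAACTTTAA"]:
        print(orf, classify_element(orf, consensus))
```

Comparing whole proteins to the consensus, rather than counting mutations, mirrors the definition used in the text: any amino acid change makes an element "nonautonomous", even though, as noted, some such changes may be functionally silent.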

Tvmar1 exhibits insertion polymorphism

To test our hypothesis that Tvmar1 TEs exhibit insertion polymorphism, we designed a PCR assay to identify the presence or absence of Tvmar1 insertions in the genomes of a wide variety of T. vaginalis isolates (Methods). The T. vaginalis reference genome assembly was screened, and unique oligonucleotide primers were identified that flank a subset of 19 full-length Tvmar1 elements, approximately 5% of the full-length elements found in the reference genome sequence. The elements were selected at random; the only selection criterion was the presence of sufficient unique flanking sequence to design element-specific primers. The SNP calls and indels of the 19 Tvmar1 elements were confirmed by Sanger sequencing and are provided in Additional files 1 and 2. Of the 19 elements, eight are putatively nonautonomous: three contain indels that truncate the transposase ORF, and five contain nonsynonymous mutations within the transposase. The remaining eleven elements encode a transposase protein identical to the consensus sequence and are putatively autonomous. Each element was assigned a locus number, and this subset of elements is referred to in the remainder of this study. Details of the 19 loci used in the study are provided (see Additional file 3).

We performed our PCR assay on 94 global T. vaginalis isolates from the United States of America (USA, total 41), Mexico (MEX, total 10), Italy (ITA, total 5), South Africa (SAF, total 10), Australia (AUS, total 7) and Papua New Guinea (PNG, total 13), and on several commonly used laboratory strains (LST, total 8). A file describing the isolates used in this study is provided (see Additional file 4). For each isolate, a PCR using the unique flanking primers was performed, followed by a second PCR with an internal transposon primer to confirm the result. A table of insertion genotypes by locus and isolate is provided (see Additional file 5). Only known Tvmar1 insertions identified in the G3 reference strain were screened for in other strains. From a total of 1,877 PCRs performed in duplicate, 376 insertions were found, with an overall PCR success rate of 87.2% (lowest success rate for any locus, 67.4%; highest, 98.9%). As anticipated, given the potentially deleterious nature of transposon insertions in gene-rich regions, most Tvmar1 loci were absent from the majority of isolates (Figure 1A). At 14 of the 19 loci, fewer than 25% of isolates had an insertion, and at 16 of the 19 loci, fewer than 50% of isolates had an insertion. The locus 18 insertion was found only in the G3 reference strain and in no other isolate.
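The per-locus insertion frequencies and PCR success rates above can be computed as in the following sketch. It assumes, hypothetically, that each locus is stored as one call per isolate (`1` for an insertion, `0` for an empty site, `None` for a failed PCR) together with a parallel list of population labels, and that frequencies are taken over successful reactions only. The study itself used custom R scripts, so this only illustrates the arithmetic; `frequency_spectrum` bins per-locus frequencies into counts of the kind plotted in Figure 1A.

```python
from collections import Counter, defaultdict


def locus_summary(calls):
    """Return (insertion frequency, PCR success rate) for one locus."""
    ok = [c for c in calls if c is not None]          # successful reactions only
    success_rate = len(ok) / len(calls)
    frequency = sum(ok) / len(ok) if ok else float("nan")
    return frequency, success_rate


def frequency_by_population(calls, populations):
    """Insertion frequency per population label (calls and labels are parallel lists)."""
    grouped = defaultdict(list)
    for call, pop in zip(calls, populations):
        if call is not None:
            grouped[pop].append(call)
    return {pop: sum(v) / len(v) for pop, v in grouped.items()}


def frequency_spectrum(frequencies, bins=10):
    """Count loci per frequency bin on [0, 1]; a frequency of 1.0 goes in the top bin."""
    counts = Counter(min(int(f * bins), bins - 1) for f in frequencies)
    return [counts.get(i, 0) for i in range(bins)]


if __name__ == "__main__":
    calls = [1, 0, None, 1]                 # toy data for four isolates
    pops = ["USA", "USA", "SAF", "SAF"]
    print(locus_summary(calls))
    print(frequency_by_population(calls, pops))
```

Excluding failed reactions from the denominator keeps a locus with a poor success rate (67.4% at worst, per the text) from appearing to have a lower insertion frequency than it does.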

In particular, loci 1, 9 and 13 showed high insertion frequencies, with locus 1 and locus 9 being fixed or close to fixation in all isolates (Figure 1B). A second group, consisting of loci 4, 7, 8 and 14, showed intermediate frequencies in some of the populations. A third group, consisting of loci 2, 3, 10, 17 and 18, showed no insertions in most of the populations. The total insertion frequency was highest in the SAF population (36.1%), followed by the USA population (25.9%).

Population structure and differentiation of T. vaginalis isolates revealed by Tvmar1 as a genetic marker

Understanding the population dynamics of Tvmar1 elements in T. vaginalis will provide important clues to the evolutionary history of the family. Towards this goal, we next compared the population genetic structure of T. vaginalis determined using Tvmar1 as a genetic marker with the results of our previous population genetic analyses, which used a panel of validated microsatellite and SNP markers and a similar set of global isolates [18,19]. Briefly, these previous studies revealed high genetic diversity in T. vaginalis and two genetic 'types' or lineages with an unusual population structure, in that they are distributed at near-equal frequencies worldwide [19].

Figure 1 Tvmar1 allele frequency spectrum and insertion frequencies. (A) Tvmar1 allele frequency spectrum in a panel of 94 T. vaginalis isolates. The x-axis represents frequencies, while the y-axis represents counts. (B) Tvmar1 insertion frequencies in a panel of 94 T. vaginalis isolates. The gradient of Tvmar1 frequencies (from 0.0 to 1.0) identified in different isolates is represented as a heatmap, in which blue represents no insertion, white represents intermediate frequencies, and red represents fixation or near fixation. Two dendrograms are shown: the top dendrogram represents the clustering of loci by frequency similarity, and the dendrogram on the left represents clustering based on the frequencies of all the loci used in the study. AUS, Australia; ITA, Italy; LST, laboratory strains; MEX, Mexico; PNG, Papua New Guinea; SAF, South Africa; USA, United States of America.

Similar to our previous findings, no geographical population differentiation of T. vaginalis isolates was found using Tvmar1 as a genetic marker. However, principal component analysis (PCA) of the Tvmar1 insertion frequencies, together with the Bayesian clustering program STRUCTURE [20] used to determine the optimal number of populations (K) from the Tvmar1 frequencies at each locus, recovered the two distinct T. vaginalis lineages (type 1 and type 2) identified in previous studies (Figure 2). We tested the significance of this two-type population structure using hierarchical analysis of molecular variance (AMOVA) and found that variation among populations with no grouping applied was significant (P < 0.000001), and that grouping populations as suggested by the PCA results, contrasting the SAF group with all other groups, was also significant (P < 0.000001). Details of the AMOVA structuring and significance tests are provided (see Additional file 6). Overall, these results provide evidence for two genetically distinct groups, as previously identified [19]. Tvmar1 elements reflect the genetic history of their host genome, similar to standard genetic markers, and support the existence of the two T. vaginalis genetic types identified previously.
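PCA on insertion data of this kind can be sketched with NumPy as below. This is an illustration under stated assumptions, not the adegenet/`dudi.pca` analysis used in the study: rows are isolates, columns are loci coded 1 (insertion) or 0 (no insertion), and columns are only centred, whereas `dudi.pca` also scales columns by default. The function name `insertion_pca` is hypothetical.

```python
import numpy as np


def insertion_pca(matrix, n_components=3):
    """PCA of an isolates x loci 0/1 insertion matrix via SVD.

    Returns (scores, explained): isolate coordinates on the first
    n_components axes and the fraction of variance each axis explains.
    """
    X = np.asarray(matrix, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre each locus (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)            # variance share per principal axis
    return scores, explained[:n_components]


if __name__ == "__main__":
    # Toy matrix: two groups of isolates with complementary insertion profiles.
    toy = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    scores, explained = insertion_pca(toy, n_components=2)
    print(np.round(scores, 3))
    print(np.round(explained, 3))
```

In the toy example the two groups separate entirely on the first axis, which then explains all of the variance; with real data, the percentages reported in the Figure 2 legend play the role of `explained`. Fixed loci (such as locus 1) have zero variance after centring and so contribute nothing to the axes.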

Tvmar1 insertion is associated with changes in host gene expression

Next we undertook experiments to determine the effect of Tvmar1 insertions on the expression of nearby T. vaginalis genes. Quantitative reverse transcription PCR (qRT-PCR) was used to quantify the mRNA expression of a total of 22 genes flanking 13 of the 19 Tvmar1 insertion sites. A total of 34 genes flanking the 19 Tvmar1 loci could be identified in the genome annotation and were initially analyzed, but genes flanking six loci (loci 1, 2, 7, 9, 14 and 18) had to be discarded for various reasons, including lack of primer specificity. For each of the 13 Tvmar1 loci, ten to 15 isolates, comprising both insertion positive and insertion negative isolates, were analyzed in duplicate for expression of each flanking gene, a total of 624 qRT-PCR reactions. A list of primers used to determine the expression of Tvmar1 flanking genes is provided (see Additional file 7).

Figure 2 T. vaginalis population structure based on 19 Tvmar1 insertions. (A) Principal component analysis (PCA) using seven geographical groups. Each isolate is represented by a shape, and color represents genetic type 1 or 2. The first three PCA axes explain the highest percentage of the variation (60.81% first, 18.94% second and 10.53% third). (B) STRUCTURE analysis of T. vaginalis isolates using K = 2, determined as the optimal value of K. Each individual is represented by a thin vertical line, which is partitioned into K segments that represent its estimated population group membership fractions. Black lines separate groups of isolates. AUS, Australia; ITA, Italy; LST, laboratory strains; MEX, Mexico; PCA, principal component analysis; PNG, Papua New Guinea; SAF, South Africa; USA, United States of America.

Bradic et al. Mobile DNA 2014, 5:12
http://www.mobilednajournal.com/content/5/1/12
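This section does not state how relative mRNA levels were derived from the qRT-PCR data. One common approach, shown here purely as an assumption, is the 2^-ΔΔCt method: the target Ct is normalized to a reference gene, and the result to a calibrator (for example, the mean ΔCt of insertion negative isolates). The reference gene, the calibrator choice and the function names are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """ΔCt from technical replicates: mean target Ct minus mean reference-gene Ct."""
    return mean(target_cts) - mean(reference_cts)


def relative_mrna(target_cts, reference_cts, calibrator_delta_ct):
    """Relative expression by the 2^-ΔΔCt method (assumes ~100% PCR efficiency)."""
    ddct = delta_ct(target_cts, reference_cts) - calibrator_delta_ct
    return 2.0 ** -ddct


if __name__ == "__main__":
    # Duplicate reactions for one isolate; calibrator ΔCt taken as 5.0 (toy value).
    print(relative_mrna([27.0, 27.2], [20.1, 19.9], calibrator_delta_ct=5.0))
```

Each extra cycle needed to reach threshold halves the estimated relative level, so a target that crosses two cycles later than the calibrator (after normalization) scores 0.25.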

In the majority of cases, gene expression was reduced in insertion positive isolates, that is, those parasites that contained a Tvmar1 element close to the gene analyzed (Figure 3). In particular, six genes (TVAG_020430, TVAG_256120, TVAG_314970, TVAG_340930, TVAG_340940 and TVAG_250350) flanking five Tvmar1 loci (loci 5, 6, 8, 10 and 15) showed significant reduction or abolition of relative mRNA levels in insertion positive isolates compared with insertion negative isolates. The differences between the insertion positive and insertion negative groups were not significant for 12 genes, and, interestingly, four genes (TVAG_372050, TVAG_260330, TVAG_455510 and TVAG_130220) showed increased expression in insertion positive isolates, although these differences were also not significant (Figure 3). Thus, insertion of a Tvmar1 element close to a gene can affect that gene's transcription.
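The Figure 3 comparisons use Welch's two-sample t-test, which does not assume equal variances in the insertion positive and insertion negative groups. The statistic and its Welch-Satterthwaite degrees of freedom can be computed with the Python standard library as sketched below (the function name is hypothetical); for a P value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)          # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    positive = [0.2, 0.1, 0.4, 0.3]    # toy relative mRNA levels, insertion positive
    negative = [1.1, 0.9, 1.3, 1.0]    # toy relative mRNA levels, insertion negative
    print(welch_t(positive, negative))
```

Because each group here contains only ten to 15 isolates, the fractional degrees of freedom matter: they are typically smaller than the pooled n - 2 of Student's test, which makes the test more conservative when group variances differ.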

To examine whether this apparent effect of Tvmar1 insertion on T. vaginalis gene expression correlates with distance from the insertion site, and whether it depends on the insertion lying 5' or 3' to the gene, we plotted relative mRNA levels against distance from the Tvmar1 locus (Figure 4). In insertion positive isolates, mRNA expression was positively correlated with increasing distance from the Tvmar1 locus for genes with a Tvmar1 insertion located 5' to them (Figure 4A), but not for genes with a 3' insertion (Figure 4B); no such effect was seen in insertion negative isolates, which lack the Tvmar1 element (Figure 4C, D).
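A Figure 4-style analysis can be sketched as an ordinary least-squares fit of relative mRNA level on distance, done separately for each insertion orientation. The record layout `(distance_bp, relative_mrna, orientation)`, the minimum of three points per fit, and the function names are assumptions for illustration; the study's own plotting and statistics code is not described here.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Least-squares slope, intercept and Pearson r for paired samples."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def fit_by_orientation(records):
    """Fit one line per orientation ("5'" or "3'") from (distance, mRNA, orientation) records."""
    fits = {}
    for side in ("5'", "3'"):
        pts = [(d, m) for d, m, o in records if o == side]
        if len(pts) >= 3:                      # need a few points for a meaningful fit
            xs, ys = zip(*pts)
            fits[side] = fit_line(xs, ys)
    return fits


if __name__ == "__main__":
    toy = [(100, 0.1, "5'"), (1000, 0.5, "5'"), (4000, 0.9, "5'"),
           (50, 0.8, "3'"), (900, 0.7, "3'"), (3000, 0.8, "3'")]
    print(fit_by_orientation(toy))
```

A positive slope (and r) for the 5' subset with a flat 3' fit is the pattern the text describes; the sketch does not handle degenerate inputs such as identical distances, where the slope is undefined.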

Discussion 

Tvmar1 insertion variation in the genome of T. vaginalis

Despite the major contribution that TEs make to the genome size and gene complement of T. vaginalis, little is known about how these sequences have shaped the evolution of the parasite's genome. Indeed, relationships between TE load and parasitic life cycle are largely understudied [21]. Intracellular parasites usually have a low TE load or no TEs at all, perhaps due to evolutionary pressure to maintain a small cell size and thus a small genome [22,23]. By contrast, in the extracellular parasite Entamoeba histolytica, pathogenicity has been correlated with genome differences caused by TE insertions when pathogenic E. histolytica is compared with non-pathogenic Entamoeba dispar [24]. T. vaginalis is also an extracellular parasite, for which it has been hypothesized that a large cell size may be highly beneficial [1,8,9]. Here, we sought to understand the contribution that one major group of TEs, the T. vaginalis Tc1/mariner family Tvmar1, might make to shaping the evolution of this genome, and the functional consequences this may have for parasite fitness.

Figure 3 mRNA levels of genes flanking Tvmar1 loci in insertion positive and insertion negative isolates. Box plots showing relative mRNA levels for 22 genes flanking 13 Tvmar1 loci in insertion positive (grey, left side) and insertion negative (white, right side) isolates. The genes are ordered by decreasing mean difference in mRNA levels between insertion positive and insertion negative groups. A single asterisk above a box plot pair indicates a P value <0.05, while a double asterisk indicates a P value <0.01 (Welch two-sample t-test). The gene examined and its associated Tvmar1 locus are indicated on the x-axis (the prefix 'TVAG_' has been removed due to space constraints). Five outlier data points have been excluded from the insertion positive groups of TVAG_020450, TVAG_256130, TVAG_340930 and TVAG_419760 to allow higher resolution on the y-axis; however, these outliers are included in the statistical analyses.

Figure 4 Distance and orientation plots for Tvmar1 insertions and gene expression. Linear regression plots of relative mRNA levels of 22 genes in 15 isolates plotted against the distance in base pairs (bp) between a Tvmar1 insertion site and its nearest gene. The plots show relative mRNA levels of proximal genes with Tvmar1 insertions located (A) 5' and (B) 3' to the gene in insertion positive isolates, and (C) 5' and (D) 3' to the gene in insertion negative isolates.

Our initial studies examined the presence of 19 Tvmar1 loci in 94 global T. vaginalis isolates. A high level of insertion-site polymorphism was identified, with most loci showing low insertion frequencies in isolates from all geographic regions. Only two loci showed a high insertion frequency: the first insertion (locus 1) was present in all 94 global isolates studied, that is, it was fixed; and the second (locus 9) had an insertion frequency of 96.2%, compared with an average insertion frequency of 24.7% for the other 17 loci across all isolates. These unusually high insertion frequencies at two loci may be explained by two hypotheses. First, the insertions could provide genetic novelty and potential adaptive value for the parasite, as has been demonstrated for TE insertions in other studies [25-28]. Second, these insertions may have a neutral effect on the surrounding genome and thus have reached, or nearly reached, fixation through genetic drift. These loci are intriguing candidates for further investigation to determine why fixed or nearly fixed TE insertions occur in T. vaginalis. In addition, the presence of so many polymorphic insertions in our data provides strong support for previous suggestions that Tvmar1 is still an active mariner element that has recently amplified in the T. vaginalis genome [10].

Using the Tvmar1 insertions as a genetic marker, we found that insertion diversity was high for most of the loci and was partitioned both within and between geographical regions. The genetic groups (type 1 and type 2) identified using PCA and STRUCTURE clustering were in general agreement with published data based on SNPs and microsatellites [19], although some isolates were not assigned to the same genetic type by both approaches. A table comparing parasite genetic types as determined by microsatellite and TE polymorphism data is provided (Additional file 8). This indicates that Tvmar1 polymorphisms reflect the genetic history of T. vaginalis. It was also apparent that the SAF (South African) population may differ in origin from the other populations and may represent an older, distinct lineage, as suggested in our previous work [19]. However, these SAF isolates were collected from just two clinics, which may have introduced sampling bias into this collection of isolates.

Tvmar1 affects host gene expression

We provide the first evidence that TEs of the mariner family affect gene expression in T. vaginalis. We found six genes that exhibited significantly reduced or abolished mRNA levels in the presence of a proximal Tvmar1 insertion, and furthermore, we found that gene expression is positively correlated with increasing distance from Tvmar1 elements located 5' to the gene. As the annotations of these six genes give little information about their function, it is hard to predict the consequences of altered gene expression for the parasite. These pilot data pave the way for a larger genome-wide analysis of the effect of Tvmar1 on the T. vaginalis genome.

TEs can interrupt gene expression by inserting directly into the coding region of a gene or into a gene's regulatory sequences, or by inducing epigenetic silencing of nearby genes [29,30]. Our results provide clues to the mechanism of decreased expression for certain genes flanking Tvmar1 elements. First, the ORF of one gene (TVAG_340940) is interrupted by a Tvmar1 insertion at its 3' end, and this gene shows abolished expression in insertion positive isolates, suggesting that, as might be expected, Tvmar1 elements can disrupt gene expression by inserting directly into coding regions. Second, we found that the correlation between distance from a Tvmar1 insertion and gene expression was only significant when the insertion is located 5' to the gene. In addition, significant changes in gene expression were observed in genes located up to 4.7 kb (for example, TVAG_020430) from the nearest Tvmar1 element, while many genes that do not show significant changes were located relatively close (for example, within 36 bp for TVAG_250360). These findings suggest that Tvmar1 insertions can also affect the expression of nearby genes without disrupting their coding sequence, and that the major determinant of gene downregulation may be not proximity to the element per se, but rather the interruption of specific cis-regulatory sequence(s).

The effect of Tvmar1 insertions on transcription shown here suggests that some Tvmar1 insertions may be selected against and actively purged from the genome due to their influence on gene function and host fitness. However, it is important to note that there is likely very little 'neutral space' in the 177 Mb T. vaginalis assembly, which contains a predicted set of approximately 60,000 protein-coding genes with an average size of 929 bp and a gene density of one gene per 2,956 bp (that is, roughly one gene every 3 kb) [8]. If there is little neutral space in the genome for Tvmar1 insertions to land in, then many of them may have deleterious effects, because they are likely to disrupt genes or their regulatory sequences. Moreover, the set of insertions analyzed here may be biased towards gene-damaging insertions, because unique flanking sequences were required for PCR amplification, and Tvmar1 insertions in non-unique (repetitive) regions of the genome could not be assayed with these methods.
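The 'little neutral space' argument can be made concrete with the figures quoted above (177 Mb assembly, approximately 60,000 genes, mean gene size 929 bp). Assuming, for simplicity, that genes are evenly spaced:

```latex
\begin{aligned}
\text{mean gene spacing} &\approx \frac{177 \times 10^{6}\ \text{bp}}{60{,}000\ \text{genes}} \approx 2{,}950\ \text{bp},\\[4pt]
\text{mean intergenic gap} &\approx 2{,}950 - 929 \approx 2{,}000\ \text{bp},\\[4pt]
\text{max. distance to nearest gene} &\approx \tfrac{1}{2} \times 2{,}000 \approx 1{,}000\ \text{bp}.
\end{aligned}
```

Under this idealization, an insertion that misses every coding sequence still lands within roughly 1 kb of a gene, well inside the 4.7 kb distance at which a significant expression change was observed for TVAG_020430.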

In addition to purifying selection that purges Tvmar1 insertions from the T. vaginalis genome, other host mechanisms or self-regulation could control Tvmar1 load and activity in this unicellular eukaryote. So far, none of the host defense mechanisms against TEs (methylation, chromatin modification, RNAi pathways or DNA editing) known to regulate TE insertion in other organisms [31,32] has been shown to be functional in T. vaginalis [33]. The presence of RNAi homologs in the genome [1,8,9] and of miRNA candidates [34,35] suggests that T. vaginalis may have an RNAi pathway and thus a potential regulatory mechanism for TE insertion [36]. More studies are needed to explore the roles of these epigenetic mechanisms in T. vaginalis Tvmar1 dynamics and their functional implications for host gene expression.

Taken together, our data suggest that mariner element transposition has functional implications for T. vaginalis and provides an important source of genetic variation on which natural selection can act to produce adaptation.

However, other TE families should be examined to determine whether they play a similarly important role in shaping T. vaginalis genome evolution. An important question is whether T. vaginalis is doomed by its extreme TE load and presumed asexual lifestyle, or whether it has sufficient mechanisms to purge deleterious TE insertions and thus maintain a balance between TE gain and loss that allows for the evolution of novelty and adaptation. Our future goals will be to understand which conditions promote Tvmar1 mobilization, and how we can harness this extraordinary burden as a tool for gene transfection and silencing, and ultimately to control T. vaginalis infections.

Conclusions 

Our study is the first to provide evidence of mariner element dynamics and of the contribution of this transposable element family to the genetic variability of the protist T. vaginalis. Here we have shown that Tvmar1 insertion sites are polymorphic in a set of 94 global T. vaginalis isolates, and that the majority of insertions are present at low frequency. While low Tvmar1 insertion frequencies are not unexpected given the known deleterious effects of TEs, we nevertheless identified two fixed or nearly fixed Tvmar1 loci in the 94 isolates. In addition, we observed that in several instances low-frequency insertions are associated with either reduced or abolished mRNA expression of nearby T. vaginalis genes. This observation indicates the potential impact of Tvmar1 on the adaptive evolution of T. vaginalis, and underlines the importance of further work on the TE burden of the parasite.

Methods 

Source of T. vaginalis isolates and DNA

The 19 Tvmar1 elements used in this study were previously identified [10] and are annotated in the current annotation of the T. vaginalis G3 strain genome (build 1.3) in TrichDB (http://www.trichdb.org). A total of 94 T. vaginalis isolates from six different regions of the world, characterized as previously described [19], were used (Additional file 4). All isolates were cultured in modified Diamond's medium supplemented with 10% horse serum, penicillin and streptomycin (Invitrogen, Carlsbad, CA, USA), and an iron solution composed of ferrous ammonium sulfate and sulfosalicylic acid (Thermo Fisher Scientific, Waltham, MA, USA) [19]. DNA was extracted using a standard phenol-chloroform method [19].

PCR primer design and Tvmar1 element amplification

To find Tvmarl insertions in the global set of 94 isolates, a 
PCR-based assay was designed such that amplification of 
a unique sized band would unambiguously indicate the 
presence or absence of an element. Briefly, the sequence 
of Tvmarl [GenBank:AY282463] was entered into the 
TrichDB BLAST function, and fiall-length matches selected. 
Unique PGR primer pairs were designed using oligonucleo- 
tide calculator software (Oligo Calc: http://www.basic. 
northwestern.edu/biotools/oligocalc.html) to the flanking 
regions either side of the Tvmarl locus in the reference 
genome (Additional file 3) A third and fourth primer were 
designed to internal sequences of the transposase, and these 
were used with a flanking primer to confirm an insertion. 
Primer Int1 (5'-AAACTTCTTGGATTGATACGCACCC-3') was used with the
forward flanking primer and Int2 (5'-TGTCGGTTTTTTGGGGCGTGAATG-3')
with the reverse flanking primer, because these combinations
improved amplification in some instances. PCRs were performed in
96-well plates, and the reference strain G3 was included as a
control on each plate to confirm the locus and the amplification
efficiency of each reaction. PCR was performed in 25 µl reaction
volumes containing approximately 25 ng of template DNA, 0.5 µl of
each primer (10 pmol/µl) and GoTaq® Green Master Mix (Promega,
Madison, WI, USA) under the following conditions: 2 minutes at
95°C; 30 cycles of 1 minute at 95°C, 2 minutes at the annealing
temperature and 4 minutes at 72°C; and a final 5 minutes at 72°C.
PCR products were sized by electrophoresis on a 1% agarose gel
containing EtBr in 1× TAE buffer and visualized under UV light.
Sanger sequencing (two-fold coverage) of all PCR products confirmed
the presence or absence of Tvmar1 elements. The sequences were
aligned with Clustal Omega [37] to determine whether each element
retained the characteristics of an autonomous element (an intact
transposase gene) or was non-autonomous (with deletions and
mutations in the transposase gene).
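Oligo Calc reports several melting-temperature estimates. As a rough check on primers such as Int1 and Int2, the sketch below reproduces only the basic base-composition formula that, as we understand it, Oligo Calc lists for oligonucleotides longer than 13 nt (Tm = 64.9 + 41 × (G + C − 16.4)/N), falling back to the Wallace rule 2(A + T) + 4(G + C) for shorter ones. It is an illustrative assumption, not the procedure used to set the annealing temperatures in this study; salt- and nearest-neighbour-corrected estimates will differ.

```python
def gc_count(seq: str) -> int:
    """Number of G and C bases in a DNA sequence (case-insensitive)."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def basic_tm(seq: str) -> float:
    """Basic melting temperature (degrees C) from base composition.

    Assumed formulas (as listed by Oligo Calc for its 'basic' estimate):
    - Wallace rule for sequences of 13 nt or fewer: 2*(A+T) + 4*(G+C)
    - otherwise: 64.9 + 41*(G+C-16.4)/N
    """
    s = seq.upper()
    if set(s) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    n = len(s)
    gc = gc_count(s)
    if n <= 13:
        return 2.0 * (n - gc) + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / n


# Internal transposase primers quoted in the text (5'->3')
INT1 = "AAACTTCTTGGATTGATACGCACCC"
INT2 = "TGTCGGTTTTTTGGGGCGTGAATG"

if __name__ == "__main__":
    for name, primer in (("Int1", INT1), ("Int2", INT2)):
        print(f"{name}: {len(primer)} nt, GC {gc_count(primer) / len(primer):.0%}, "
              f"basic Tm {basic_tm(primer):.1f} C")
```

With this formula, Int1 (25 nt, 11 G/C) gives about 56.0°C and Int2 (24 nt, 12 G/C) about 57.4°C; the flanking primers listed in Additional file 3 could be checked the same way.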

Population frequencies and structure analysis 

Insertion frequencies were calculated from the PCR data for each
isolate using custom R scripts. The heatmap of insertion frequencies
by geographical origin was generated with the heatmap.2 function of
the R gplots package (http://CRAN.R-project.org/package=gplots).
Population structure was examined with a non-model-based
multivariate method, principal component analysis (PCA), as
implemented in the adegenet 1.2-2 package using the dudi.pca
function [38]. In addition, we used the Bayesian clustering program
STRUCTURE 2.2 to determine the optimal number of populations (K)
based on Tvmar1 insertion frequencies at each locus [20]. The
program was run ten times for each value of K from 1 to 7, with a
burn-in of 1 × 10^… iterations followed by 1 × 10^… Markov chain
Monte Carlo (MCMC) repeats. The optimal K was determined from the
probability of the data, Pr(X|K), and from ΔK, the second-order rate
of change in the log probability of the data between successive K
values, calculated with the Evanno method in Structure Harvester
[39,40].
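The Evanno criterion can be sketched as follows, assuming the per-run values of ln Pr(X|K) (reported by STRUCTURE as "Estimated Ln Prob of Data") have been collected for consecutive values of K. Following the way Structure Harvester tabulates the statistic, as we understand it, ΔK(K) = |mean L(K+1) − 2·mean L(K) + mean L(K−1)| / sd L(K), so it is defined only for interior K (2 to 6 when K = 1 to 7 is tested). The function name, input layout and toy values are illustrative.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k: dict[int, list[float]]) -> dict[int, float]:
    """Evanno et al. (2005) Delta K from replicate STRUCTURE runs.

    lnp_by_k maps each K to the ln Pr(X|K) values of its replicate runs.
    K values must be consecutive integers, each with at least two runs.
    """
    ks = sorted(lnp_by_k)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("K values must be consecutive")
    m = {k: mean(v) for k, v in lnp_by_k.items()}   # mean L(K) across runs
    s = {k: stdev(v) for k, v in lnp_by_k.items()}  # sd of L(K) across runs

    delta = {}
    for k in ks[1:-1]:  # Delta K is undefined for the smallest and largest K
        second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])  # |L''(K)|
        delta[k] = second_diff / s[k] if s[k] > 0 else float("inf")
    return delta


if __name__ == "__main__":
    # Hypothetical ln Pr(X|K) values, two runs per K
    runs = {1: [-500.0, -502.0], 2: [-400.0, -404.0],
            3: [-390.0, -396.0], 4: [-388.0, -398.0]}
    dk = evanno_delta_k(runs)
    print(dk, "best K:", max(dk, key=dk.get))
```

Dividing by the run-to-run standard deviation penalizes values of K whose likelihood is unstable across replicates, which is why ten runs per K are useful.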

We evaluated the relative contributions of variation among groups,
among populations within groups and within populations using
analysis of molecular variance (AMOVA) [41], performed in Arlequin
version 3.0 [42] with 20,000 permutations. We tested the null
hypothesis of panmixia (all populations pooled) as well as the
structuring revealed by PCA, treating the South African isolates as
one group versus all other populations. In addition, a test for
geographical structuring was performed.
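The insertion frequencies that feed the heatmap and the population analyses can be computed from the genotype codes of Additional file 5 roughly as follows. The study used custom R scripts; this Python sketch is an assumed equivalent in which MIX counts as insertion positive (as stated for Additional file 5) and NA is excluded from the denominator. The data layout and the pooled label "ALL" are hypothetical.

```python
from collections import defaultdict

# Genotype codes from Additional file 5; MIX counts as insertion positive,
# NA (no data) is skipped. Any other code raises KeyError.
CODE_TO_STATE = {"1": 1, "MIX": 1, "0": 0, "NA": None}
POOLED = "ALL"  # hypothetical label for the pooled global sample


def insertion_frequencies(genotypes, regions):
    """Insertion frequency per (region, locus), plus a pooled (POOLED, locus) entry.

    genotypes: {isolate: {locus: code}}; regions: {isolate: region}.
    """
    counts = defaultdict(lambda: [0, 0])  # (region, locus) -> [positives, typed]
    for isolate, calls in genotypes.items():
        for locus, code in calls.items():
            state = CODE_TO_STATE[code]
            if state is None:
                continue  # missing data does not enter the denominator
            for key in ((regions[isolate], locus), (POOLED, locus)):
                counts[key][0] += state
                counts[key][1] += 1
    return {key: pos / typed for key, (pos, typed) in counts.items()}
```

The resulting region-by-locus table can be pivoted into the matrix a heatmap function expects; a region in which every isolate is NA at a locus simply has no entry for it.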

qRT-PCR analysis 

Specific qRT-PCR primers were designed for 34 genes flanking 18
loci; for loci 3, 7, 8, 9 and 17, unique primers for the most
proximal gene could not be designed because of the surrounding
repetitive sequence, so the second most proximal gene was assayed
instead. All primers were first verified using genomic DNA from
insertion-negative strains, and the amplified fragments were
verified by Sanger sequencing. Twelve genes were discarded because
of non-specific primer binding (identified by melting curve
analysis), a lack of amplification from genomic DNA, or the presence
of only one isolate in either the insertion-positive or the
insertion-negative group, which precluded statistical analysis. Total RNA
was isolated using TRIzol® Reagent (Invitrogen) [43] and
reverse-transcribed into cDNA using the ImProm-II™ Reverse
Transcription System (Promega), and qRT-PCR analysis was performed
on a LightCycler 480 (Roche, Basel, Switzerland). The remaining 22
genes were tested in duplicate in 20 µl reaction volumes containing
1 µl of cDNA, 2 µl of 10× PCR primer mix and 10 µl of SYBR® Green
Master Mix, with the following cycling conditions: 5 minutes at
95°C; 45 cycles of 4 seconds at 95°C, 6 seconds at 55°C and 20
seconds at 72°C; followed by melting curve analysis (95°C for 1
second and 55°C for 1 second). Coronin (TVAG_021420), a known
single-copy gene previously characterized in different T. vaginalis
isolates [18,19], was used as an endogenous control
against which all gene expression data were normalized.
Amplification efficiencies were calculated for each primer set using
cDNA from an insertion-negative strain, and relative mRNA levels
were calculated for each gene in each isolate using the Pfaffl
(2001) method [44]. Isolates were grouped by the presence or absence
of a Tvmar1 insertion at each locus. The significance of differences
in gene expression between these two groups was assessed with
Welch's two-sample t-test in R version 2.13.1
(http://www.R-project.org). Linear regression analyses were also
carried out in R version 2.13.1.
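The normalization and group comparison described above can be sketched as follows. The Pfaffl ratio is written in its published form, with efficiencies derived from a standard-curve slope; which sample served as the calibrator is not stated, so the calibrator arguments (for example, an insertion-negative isolate) are an assumption, as are all function names. The Welch statistic is computed directly to show the calculation; in practice R's `t.test` (whose default, `var.equal = FALSE`, is the Welch test) or `scipy.stats.ttest_ind(..., equal_var=False)` would also supply the p-value.

```python
from statistics import mean, variance


def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency E from a standard-curve slope (Cq vs log10 input).

    E = 10**(-1/slope); E = 2.0 corresponds to perfect doubling each cycle.
    """
    return 10 ** (-1.0 / slope)


def pfaffl_ratio(e_target, cq_target_cal, cq_target_sample,
                 e_ref, cq_ref_cal, cq_ref_sample):
    """Pfaffl (2001) expression ratio of a sample relative to a calibrator.

    ratio = E_target**(Cq_cal - Cq_sample) / E_ref**(Cq_cal - Cq_sample);
    the reference gene (coronin in the text) corrects for input differences.
    """
    return (e_target ** (cq_target_cal - cq_target_sample)) / (
        e_ref ** (cq_ref_cal - cq_ref_sample))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values, mirroring why genes with a single
    isolate in one group could not be tested.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For example, a target gene whose Cq drops by two cycles relative to the calibrator, with perfect efficiency and an unchanged reference Cq, gives `pfaffl_ratio(2.0, 25.0, 23.0, 2.0, 20.0, 20.0) == 4.0`; the per-isolate ratios for insertion-positive and insertion-negative isolates are then passed to `welch_t`.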

Additional files 



Additional file 1: Alignment of the sequences of 19 Tvmar1
elements used in this study. (A) Tvmar1 DNA alignment showing ITRs
designated with a red box across the sequences. The transposase ORF is 
underlined. Indels that truncate the transposase ORF are indicated with 
arrowheads and non-synonymous mutations are designated with 
asterisks. ITR, inverted terminal repeat; ORF, open reading frame. 

Additional file 2: Alignment of the sequences of 19 Tvmar1
elements used in this study. (B) Protein alignment of Tvmar1
transposase showing truncations and amino acid variation among the
19 elements. The aspartic acid residues that form the DD34D catalytic
motif are indicated with arrowheads.

Additional file 3: Table of 19 transposable elements used in this 
study. The transposable element locus ID, primers used to amplify the 
loci and the flanking genes of the 19 transposable elements are shown. 

Additional file 4: List of all T. vaginalis isolates used in this study.
Details of the geographical origin and year of isolation, and the
reference for each previously described isolate, are provided.

Additional file 5: Table of insertion genotypes by locus and isolate.
The following labeling is used: 1, insertion positive; 0, insertion negative;
NA, no data available; MIX, both PCR fragments present in the isolate. All
MIX genotypes were treated as insertion positive in the data analysis.
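The genotype codes above can be assigned from gel results roughly as in the following sketch. It assumes that the flanking primer pair yields an empty-site product of known size when no element is present and a product longer by the element plus a 2 bp target-site duplication when one is present (Tc1/mariner elements insert at TA dinucleotides, which are duplicated on insertion), and that a product from a flanking primer paired with Int1 or Int2 also counts as evidence of insertion. The function names, the 5% sizing tolerance and the example sizes are hypothetical.

```python
TSD_BP = 2  # TA target-site duplication created by a Tc1/mariner insertion


def expected_sizes(empty_site_bp: int, element_bp: int) -> tuple[int, int]:
    """Flanking-primer product sizes without and with an inserted element."""
    return empty_site_bp, empty_site_bp + element_bp + TSD_BP


def matches(observed_bp: float, expected_bp: int, tol: float = 0.05) -> bool:
    """True if a gel band lies within a relative tolerance of the expected size."""
    return abs(observed_bp - expected_bp) <= tol * expected_bp


def call_genotype(bands_bp, empty_site_bp, element_bp, internal_product=False):
    """Return '1', '0', 'MIX' or 'NA' following the Additional file 5 codes.

    bands_bp: sizes of bands from the flanking primer pair;
    internal_product: whether a flanking + internal (Int1/Int2) pair amplified.
    """
    empty, filled = expected_sizes(empty_site_bp, element_bp)
    has_empty = any(matches(b, empty) for b in bands_bp)
    has_insert = internal_product or any(matches(b, filled) for b in bands_bp)
    if has_empty and has_insert:
        return "MIX"  # both fragments present
    if has_insert:
        return "1"
    if has_empty:
        return "0"
    return "NA"  # no diagnostic band: treated as missing data
```

In this scheme a lane with no diagnostic band is recorded as missing rather than negative, which keeps failed reactions out of the frequency denominators.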

Additional file 6: AMOVA in T. vaginalis global isolates for 19
Tvmar1 loci. A result of no structuring was found to be significant
among the groups (P <0.000001) with 9.96% of the explained variance, 
compared with 72.71% variance among individuals within populations. 
Contrasting the SAF group versus all other groups was significant 
(P <0.000001) and explained approximately 25% of the variance among 
groups, with the largest proportion of the variance among individuals 
within populations (58.70%), which did not show statistical support 
(P <0.05). % VAR, percentage of variation; AMOVA, analysis of molecular 
variance; SAF, South Africa; SS, sum of squares; VC, variance components. 



Additional file 7: Table of qRT-PCR primers used in the study. List of 
primers used to determine mRNA levels of genes flanking Tvmar1 loci,
and their description. 

Additional file 8: Table of comparative parasite genetic types as
determined by microsatellite and TE polymorphism data. Parasite
genetic types from microsatellite data are based on the
microsatellite genotyping of Conrad et al. [19], while types from
Tvmar1 insertion data are based on this study. ND, not determined;
TE, transposable element.